NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Eliciting Confidence for Improving Crowdsourced Audio Annotations

https://doi.org/10.1145/3512935

Méndez Méndez, Ana Elisa; Cartwright, Mark; Bello, Juan Pablo; Nov, Oded (March 2022, Proceedings of the ACM on Human-Computer Interaction)

In this work we explore confidence elicitation methods for crowdsourcing "soft" labels, e.g., probability estimates, to reduce the annotation costs for domains with ambiguous data. Machine learning research has shown that such "soft" labels are more informative and can reduce the data requirements when training supervised machine learning models. By reducing the number of required labels, we can reduce the costs of slow annotation processes such as audio annotation. In our experiments we evaluated three confidence elicitation methods: 1) "No Confidence" elicitation, 2) "Simple Confidence" elicitation, and 3) "Betting" mechanism for confidence elicitation, at both individual (i.e., per participant) and aggregate (i.e., crowd) levels. In addition, we evaluated the interaction between confidence elicitation methods, annotation types (binary, probability, and z-score derived probability), and "soft" versus "hard" (i.e., binarized) aggregate labels. Our results show that both confidence elicitation mechanisms result in higher annotation quality than the "No Confidence" mechanism for binary annotations at both participant and recording levels. In addition, when aggregating labels at the recording level, results indicate that we can achieve comparable results to those with 10-participant aggregate annotations using fewer annotators if we aggregate "soft" labels instead of "hard" labels. These results suggest that for binary audio annotation using a confidence elicitation mechanism and aggregating continuous labels we can obtain higher annotation quality, more informative labels, with quality differences more pronounced with fewer participants. Finally, we propose a way of integrating these confidence elicitation methods into a two-stage, multi-label annotation pipeline.
more » « less
Full Text Available
Urban Rhapsody: Large‐scale exploration of urban soundscapes

https://doi.org/10.1111/cgf.14534

Rulff, Joao; Miranda, Fabio; Hosseini, Maryam; Lage, Marcos; Cartwright, Mark; Dove, Graham; Bello, Juan; Silva, Claudio T. (June 2022, Computer Graphics Forum)

Full Text Available
Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes

https://doi.org/10.1109/WASPAA52581.2021.9632767

Cramer, Aurora; Cartwright, Mark; Pishdadian, Fatemeh; Bello, Juan Pablo (October 2021, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA))

Full Text Available
Specialized Embedding Approximation for Edge Intelligence: A Case Study in Urban Sound Classification

https://doi.org/10.1109/ICASSP39728.2021.9414287

Srivastava, Sangeeta; Roy, Dhrubojyoti; Cartwright, Mark; Bello, Juan P.; Arora, Anish (June 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
null (Ed.)
Embedding models that encode semantic information into low-dimensional vector representations are useful in various machine learning tasks with limited training data. However, these models are typically too large to support inference in small edge devices, which motivates training of smaller yet comparably predictive student embedding models through knowledge distillation (KD). While knowledge distillation traditionally uses the teacher’s original training dataset to train the student, we hypothesize that using a dataset similar to the student’s target domain allows for better compression and training efficiency for the said domain, at the cost of reduced generality across other (non-pertinent) domains. Hence, we introduce Specialized Embedding Approximation (SEA) to train a student featurizer to approximate the teacher’s embedding manifold for a given target domain. We demonstrate the feasibility of SEA in the context of acoustic event classification for urban noise monitoring and show that leveraging a dataset related to this target domain not only improves the baseline performance of the original embedding model but also yields competitive students with >1 order of magnitude lesser storage and activation memory. We further investigate the impact of using random and informed sampling techniques for dimensionality reduction in SEA.
more » « less
Full Text Available
Per-Channel Energy Normalization: Why and How

https://doi.org/10.1109/LSP.2018.2878620

Lostanlen, Vincent; Salamon, Justin; Cartwright, Mark; McFee, Brian; Farnsworth, Andrew; Kelling, Steve; Bello, Juan Pablo (January 2019, IEEE Signal Processing Letters)

Full Text Available
Scaper: A library for soundscape synthesis and augmentation

https://doi.org/10.1109/WASPAA.2017.8170052

Salamon, Justin; MacConnell, Duncan; Cartwright, Mark; Li, Peter; Bello, Juan Pablo (October 2017, Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-17))

Sound event detection (SED) in environmental recordings is a key topic of research in machine listening, with applications in noise monitoring for smart cities, self-driving cars, surveillance, bioa-coustic monitoring, and indexing of large multimedia collections. Developing new solutions for SED often relies on the availability of strongly labeled audio recordings, where the annotation includes the onset, offset and source of every event. Generating such precise annotations manually is very time consuming, and as a result existing datasets for SED with strong labels are scarce and limited in size. To address this issue, we present Scaper, an open-source library for soundscape synthesis and augmentation. Given a collection of iso-lated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined, 'specification'. To increase the variability of the output, Scaper supports the application of audio transformations such as pitch shifting and time stretching individually to every event. To illustrate the potential of the library, we generate a dataset of 10,000 sound-scapes and use it to compare the performance of two state-of-The-Art algorithms, including a breakdown by soundscape characteristics. We also describe how Scaper was used to generate audio stimuli for an audio labeling crowdsourcing experiment, and conclude with a discussion of Scaper's limitations and potential applications.
more » « less
Full Text Available

Search for: All records